Univariate Time Series
2025-07-22
Big picture: Models for Single Time Series
In the case of integer orders of integration, the widely available and simple solution is to take sufficient differences to render the variable stationary. If the levels are not stationary, try the changes; if the changes are non-stationary, try the change in the changes. And so on. But differencing has rather pernicious substantive consequences: a model of changes no longer speaks directly to relationships among the levels that theory is usually about.
When discussing heteroscedasticity, we noticed that the off-diagonal elements are all zeroes. This is the assumption of no correlation among [somehow] adjacent elements. The somehow takes two forms: (1) spatial and (2) temporal. Just as time-induced heteroscedasticity simply involved interchanging N and T and i and t, the same idea prevails here.
The basic idea: time-invariance. James D. Hamilton's Time Series Analysis defines two notions of stationarity.
Strict stationarity: A process is said to be strictly stationary if, for any values of j_{1},j_{2},\ldots,j_{n}, the joint distribution of Y_{t},Y_{t+j_{1}},Y_{t+j_{2}},\ldots,Y_{t+j_{n}} depends only on the intervals separating the dates (j_{1},j_{2},\ldots,j_{n}) and not on the date t itself.
Weak stationarity: If neither the mean \mu, nor the autocovariances \gamma_{j}, depend on the date t, then the process for Y_{t} is said to be covariance-stationary or weakly stationary.

\mathbb{E}[Y_{t}] = \mu \quad \forall\, t

\mathbb{E}[(Y_{t} - \mu)(Y_{t-j} - \mu)] = \gamma_{j} \quad \forall\, t \text{ and any } j
Everything is calculated from deviates: y_{t} - \overline{y}
We assume stationarity in so doing. And we do so at our peril.
Let’s examine a simple form of first-order dependence. Suppose that the current observation depends on the immediately prior observation through some coefficient \rho. Let \epsilon_{t} \sim N(0,\sigma^{2}_{\epsilon}). This yields:
y_{t} = \alpha + \rho_{1}y_{t-1} + \epsilon_{t}
One sufficient condition for stationarity would be that |\rho_{1}| < 1 – we will see this again shortly. Why? Suppose instead that it is one.
y_{t} = \alpha + \rho_{1}y_{t-1} + \epsilon_{t}

y_{t} - y_{t-1} = \alpha + \epsilon_{t}
The over time difference in y is a constant plus a well-behaved error. This is known as a random walk with drift.
It is worth stopping to think about a picture. The problem can be seen with the difference in y on the left-hand side: keeping in mind that \epsilon has zero expectation, the series will grow or shrink without bound by virtue of \alpha. For the levels of the series, the mean is a function of time.
Let's have a look at a little simulation.
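A minimal sketch of such a simulation (the notes use R; this numpy version, with my own variable names, is an illustrative equivalent): a stationary AR(1) against a random walk with drift built from the same innovations.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
alpha, rho = 0.2, 0.9
eps = rng.normal(0, 1, T)

# Stationary AR(1): y_t = alpha + rho * y_{t-1} + eps_t with |rho| < 1;
# it mean-reverts around alpha / (1 - rho) = 2.
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = alpha + rho * ar1[t - 1] + eps[t]

# Random walk with drift: rho = 1, so y_t - y_{t-1} = alpha + eps_t.
# The level trends without bound; its mean is a function of t.
rw = np.cumsum(alpha + eps)

print(ar1.mean())   # near 2, regardless of T
print(rw[-1] / T)   # near alpha = 0.2: the level grows roughly alpha per period
```

Plotting the two series makes the contrast vivid: the AR(1) crosses its mean repeatedly, while the random walk with drift wanders off.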
As you see, this applies to non-stationary series. But not everything with dependency through time is non-stationary. ARMA processes provide the middle ground, while the I – the order of integration – in ARIMA refers to the differencing required for stationarity.
In linear algebra, a Toeplitz matrix is constant along each descending diagonal from left to right. Such matrices have some handy computational properties and expose key features of the underlying process.
\Phi = \sigma^{2}\Psi = \sigma^{2}_{e} \left(\begin{array}{ccccc}1 & \rho_{1} & \rho_{2} & \ldots & \rho_{T-1} \\ \rho_1 & 1 & \rho_1 & \ldots & \rho_{T-2} \\ \rho_{2} & \rho_1 & 1 & \ldots & \rho_{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \ldots & 1 \end{array}\right)
This allows us to calculate the variance of e using results from mathematical statistics. For a first-order autoregressive process we would have e_{t} = \rho e_{t-1} + \nu_{t}, so that Var(e_{t}) = \rho^{2}Var(e_{t-1}) + Var(\nu_{t}). If the variance is stationary, then Var(e_{t}) = Var(e_{t-1}), and we can rewrite: \sigma^{2}_{e} = \frac{\sigma^{2}_{\nu}}{1 - \rho^{2}}
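The stationary-variance result can be verified by simulation. A sketch with numpy (seed and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
rho, sigma_nu = 0.7, 1.0
T = 200_000

# Simulate e_t = rho * e_{t-1} + nu_t
nu = rng.normal(0, sigma_nu, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + nu[t]

# Implied stationary variance: sigma_nu^2 / (1 - rho^2) = 1 / 0.51, about 1.96
implied = sigma_nu**2 / (1 - rho**2)
print(e.var(), implied)   # the two should be close for large T
```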
Autoregressive moving average (ARMA) structures characterize most time series of interest (virtually all, with the inclusion of their seasonal counterparts). In general, we write
Autoregression [AR(p)]: e_{t} = \rho_{1} e_{t-1} + \rho_{2}e_{t-2} + \cdots + \rho_{p} e_{t-p} + \nu_{t}
Moving Average [MA(q)]: e_{t} = \nu_{t} + \theta_{1} \nu_{t-1} + \theta_{2}\nu_{t-2} + \cdots + \theta_{q} \nu_{t-q}
Autoregression and Moving Average [ARMA(p, q)]: e_{t} = \rho_{1} e_{t-1} + \rho_{2}e_{t-2} + \cdots + \rho_{p} e_{t-p} + \nu_{t} + \theta_{1} \nu_{t-1} + \theta_{2}\nu_{t-2} + \cdots + \theta_{q} \nu_{t-q}
One way of getting a handle on this is to attempt to measure it. From the inimitable Allison Horst…
A new series to introduce the autocorrelation function (ACF) with time series data, with special thanks to @robjhyndman for feedback & suggestions! 1/9: Meet the monster family. The youngest generation is on the right (that's our host).

— Allison Horst (@allison_horst), February 16, 2021
Two relevant autocorrelations:
Autocorrelation: \rho_{s} = \frac{\sum^{T}_{t=s+1} (y_{t} - \overline{y})(y_{t-s} - \overline{y})}{\sum_{t=1}^{T}(y_{t} - \overline{y})^{2}}
Partial Autocorrelation: \phi_{s,s} = \frac{\rho_{s} - \sum_{j=1}^{s-1} \phi_{s-1,j}\rho_{s-j}}{1 - \sum_{j=1}^{s-1} \phi_{s-1,j}\rho_{j}}
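This Durbin-Levinson-style recursion can be checked numerically. A sketch in numpy (the function name pacf_from_acf is mine): for a true AR(1) with coefficient 0.9, the theoretical autocorrelations are 0.9^s, and the recursion should return a partial autocorrelation of 0.9 at lag one and zero thereafter.

```python
import numpy as np

def pacf_from_acf(rho, nlags):
    """Partial autocorrelations phi_{s,s} from autocorrelations rho[0..nlags] (rho[0] = 1)."""
    phi = np.zeros((nlags + 1, nlags + 1))
    pacf = np.zeros(nlags + 1)
    phi[1, 1] = pacf[1] = rho[1]
    for s in range(2, nlags + 1):
        num = rho[s] - sum(phi[s - 1, j] * rho[s - j] for j in range(1, s))
        den = 1.0 - sum(phi[s - 1, j] * rho[j] for j in range(1, s))
        phi[s, s] = pacf[s] = num / den
        for j in range(1, s):                       # update the interior coefficients
            phi[s, j] = phi[s - 1, j] - phi[s, s] * phi[s - 1, s - j]
    return pacf[1:]

rho = 0.9 ** np.arange(6)          # theoretical ACF of an AR(1): rho_s = 0.9**s
print(pacf_from_acf(rho, 5))       # 0.9 at lag one, essentially zero afterward
```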
In ARIMA modeling, these two are critical components because each process has a characteristic signature. An autoregressive process typically exhibits geometric decay in the autocorrelation function and spikes in the partial; moving average processes exhibit the reverse. Nonstationary series have autocorrelations that decay very slowly (the I in ARIMA).
This is an AR(1) with \rho=0.9.
```r
library(tidyverse)
library(fpp3)
data.frame(y = arima.sim(list(ar = 0.9, ma = 0), n = 100, n.start = 50),
           x = seq(1:100)) %>%
  as_tsibble(index = x) %>%
  gg_tsdisplay(plot_type = "partial")
```
Though these plots were generated in R, we could do the same thing in Stata. For a quick summary with a little graphic, have a look at the corrgram. For (pretty) plots, Stata has two commands to recreate this, ac and pac. The former generates the autocorrelations while the latter creates the partial autocorrelations. We will have a go at this in the lab.
The Dickey-Fuller testing philosophy relies on the following base equation that mirrors our earlier basic presentation of random walks. To obtain a test equation, subtract y_{t-1} from both sides.
y_{t} = \alpha + \beta t + \rho_{1}y_{t-1} + \epsilon_{t}

\Delta y_{t} = \alpha + \beta t + (\rho_{1}-1)y_{t-1} + \epsilon_{t}
The \alpha – drift – and \beta – trend – terms are optional depending on the series in question. Only when the estimate of \rho_{1} - 1 is significantly below zero, that is \rho_{1} < 1, can we reject the claim of nonstationarity.
The critical values are tabulated.
```stata
use "https://github.com/robertwwalker/Essex-Data/raw/main/br7983.dta"
tsset
dfuller govpopl
```
```
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -1.574 1.2634
```
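The test regression itself is just OLS on the differenced equation. A minimal numpy sketch of the drift-only case (the helper name df_stat is mine); in practice use a canned routine, because the statistic follows the nonstandard tabulated Dickey-Fuller distribution (roughly -2.86 at 5% for the drift case), not the normal.

```python
import numpy as np

def df_stat(y):
    """t-statistic on (rho1 - 1) in: y_t - y_{t-1} = alpha + (rho1 - 1) y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(3)
rw = np.cumsum(rng.normal(size=1000))   # unit root: statistic near zero
eps = rng.normal(size=1000)
ar = np.zeros(1000)
for t in range(1, 1000):
    ar[t] = 0.5 * ar[t - 1] + eps[t]    # stationary: statistic far below zero

print(df_stat(rw), df_stat(ar))
```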
The KPSS test is unique in having a null hypothesis of [trend/drift/trend and drift] stationarity. It is also fairly easy to construct.

- Regress y on the chosen option above [trend/drift/trend and drift] and isolate the residual u.
- Calculate S_{t}, the partial sum of the residuals: S_{t} = \sum_{i=1}^{t} u_{i}
- The KPSS statistic is KPSS = \frac{1}{T^{2}} \frac{\sum_{t=1}^{T} S_{t}^{2}}{s^{2}_{u}} where s^{2}_{u} is an estimate of the long-run variance of u (typically obtained by a Newey-West procedure).
- If KPSS is large, reject the claim of [trend/drift/trend and drift] stationarity.
The critical values are tabulated.
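The steps above translate almost line for line into code. A numpy sketch of the drift (level) case (function name and bandwidth default are my own choices; the 5% critical value for this case is 0.463):

```python
import numpy as np

def kpss_stat(y, lags=None):
    """KPSS statistic for the level case: regress y on a constant, sum residuals."""
    T = len(y)
    u = y - y.mean()                      # residual from regressing y on a constant
    S = np.cumsum(u)                      # partial sums S_t
    if lags is None:
        lags = int(4 * (T / 100) ** 0.25) # a common Newey-West bandwidth rule
    s2 = u @ u / T                        # long-run variance with Bartlett weights
    for l in range(1, lags + 1):
        w = 1 - l / (lags + 1)
        s2 += 2 * w * (u[l:] @ u[:-l]) / T
    return (S @ S) / (T**2 * s2)

rng = np.random.default_rng(5)
white = rng.normal(size=500)              # stationary: small statistic
rw = np.cumsum(rng.normal(size=500))      # unit root: large statistic
print(kpss_stat(white), kpss_stat(rw))    # compare to the 5% critical value 0.463
```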
H0: y_{t} = a_{0} + y_{t-1} + \mu_{1} D_{P} + \epsilon_{t} so that y_{t} is difference stationary with a jump

H1: y_{t} = a_{0} + a_{2} t + \mu_{2} D_{L} + \epsilon_{t} so that y_{t} is trend stationary (TS) with a jump

Pulse break: D_{P} = 1 \text{ if } t = T_{B} + 1 and zero otherwise,

Level break: D_{L} = 0 \text{ for } t = 1, \ldots, T_{B} and one otherwise.
If we can reject a range of pathologies, we can more rationally justify inference. The issue is the integrity of the estimand: does the conditional mean make sense?
- Unit root tests come in a host of forms, with nulls of a unit root and nulls of stationarity. The processes have different implications.
  - Levin and Lin: levinlin with H_{0}: I(1).
  - Im, Pesaran, and Shin: ipshin with H_{0}: I(1).
  - KPSS: kpss with H_{0}: I(0).
  - Fisher: xtfisher works with unbalanced panels.
- A simple xtreg with lagged y: if \beta_{y_{t-1}} \approx 1, then there is a worry.
- Plots (every structure has a different theoretical ACF/PACF).
- Durbin-Watson d and Durbin's h with endogenous variables.
- Dickey-Fuller tests and many others: \Delta y_{t} = \rho y_{t-1} + \sum_{L} \theta_{L} \Delta y_{t-L} + \lambda_{t} + u_{t}
- Breusch-Godfrey tests and the like (fit the regression, isolate residuals, regress the residual on X and lags of the residual; nR^{2} \sim \chi^{2}_{p}).
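The Breusch-Godfrey recipe in the last bullet is short enough to write out. A numpy sketch (the helper name bg_lm and the simulated data are my own; compare the statistic to the \chi^{2}_{1} 5% value, 3.84):

```python
import numpy as np

def bg_lm(y, X, p=1):
    """Breusch-Godfrey LM statistic: (n - p) * R^2 from the auxiliary regression."""
    n = len(y)
    X = np.column_stack([np.ones(n), X])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # residuals from y on X
    lagged = np.column_stack([e[p - j: n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lagged])               # X plus p lags of the residual
    r = e[p:] - Z @ np.linalg.lstsq(Z, e[p:], rcond=None)[0]
    tgt = e[p:]
    R2 = 1 - (r @ r) / ((tgt - tgt.mean()) @ (tgt - tgt.mean()))
    return (n - p) * R2

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
nu = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + nu[t]         # AR(1) errors: the test should reject
y = 1 + 2 * x + u
print(bg_lm(y, x, p=1))                   # large relative to 3.84
```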
Stata has a battery of panel unit-root tests. There are many and they operate under differing sets of assumptions. They generally follow on xtunitroot.
- xtunitroot llc: trend nocons (unit specific), demean (within transform), lags. Under (crucial) cross-sectional independence, the test is an advancement on the generic Dickey-Fuller theory that allows the lag lengths to vary by cross-section. The test relies on specifying a kernel (beyond our purposes) and a lag length (upper bound). The test statistic has a standard normal basis with asymptotics in \frac{\sqrt{N_{T}}}{T} (T grows faster than N). The test is of either all series containing unit roots (H_{0}) or all stationary; this is a limitation. It is recommended for moderate to large T and N.
- xtunitroot ht: trend nocons (unit specific), demean (within transform), altt (small sample adjust).
- xtunitroot breitung: trend nocons (unit specific), demean (within transform), robust (CSD), lags. Similar to LLC with a common statistic across all i.
- xtunitroot ips: trend, demean (within transform), lags. They free \rho to be \rho_{i} and average individual unit root statistics. The null is that all contain unit roots while the alternative specifies at least some to be stationary. The test relies on sequential asymptotics (first T, then N). Better in small samples than LLC, but note the differences in the alternatives.
- xtunitroot fisher: dfuller, pperron, demean, lags.
- xtunitroot hadri: trend, demean, robust.

All but the last are tests with a null hypothesis of a unit root. Most assume balance, but the fisher and IPS versions can work with unbalanced panels.
Trend: pattern exists when there is a long-term increase or decrease in the data.
Seasonal: pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
Cyclic: pattern exists when data exhibit rises and falls that are not of a fixed period (duration usually of at least 2 years).
ESSSSDA25-2W: Heterogeneity and Dynamics